AITopics | Sendai

We present an ultra-fast and flexible search algorithm that enables search over trillion-scale natural language corpora in under 0.3 seconds while handling semantic variations (substitution, insertion, and deletion). Our approach employs string matching based on suffix arrays that scales well with corpus size. To mitigate the combinatorial explosion induced by the semantic relaxation of queries, our method is built on two key algorithmic ideas: fast exact lookup enabled by a disk-aware design, and dynamic corpus-aware pruning. We theoretically show that the proposed method suppresses exponential growth in the search space with respect to query length by leveraging statistical properties of natural language. In experiments on FineWeb-Edu (Lozhkov et al., 2024) (1.4T tokens), we show that our method achieves significantly lower search latency than existing methods: infini-gram (Liu et al., 2024), infini-gram mini (Xu et al., 2025), and SoftMatcha (Deguchi et al., 2025). As a practical application, we demonstrate that our method identifies benchmark contamination in training corpora that existing approaches fail to detect. We also provide an online demo of fast, soft search across corpora in seven languages.
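The two ingredients named in the abstract, exact lookup over a suffix array and pruning of relaxed queries against the corpus, can be illustrated on a toy scale. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it keeps a small token list in memory, builds the suffix array naively, and uses a hypothetical `substitutes` map in place of real semantic similarity. It handles substitutions only (no insertions or deletions), and every function name is hypothetical.

```python
def build_suffix_array(tokens):
    # Naive construction: sort suffix start positions by the suffix they begin.
    # Adequate for a toy corpus only; trillion-token indexes need a scalable build.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def match_range(tokens, sa, pattern):
    """Return [lo, hi) such that the suffixes sa[lo:hi] all start with `pattern`."""
    k = len(pattern)

    def prefix(rank):
        # First k tokens of the suffix at position `rank` in the suffix array.
        return tokens[sa[rank]:sa[rank] + k]

    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first suffix whose k-prefix is >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first suffix whose k-prefix is > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def count_matches(tokens, sa, pattern):
    lo, hi = match_range(tokens, sa, pattern)
    return hi - lo


def soft_search(tokens, sa, query, substitutes):
    """Substitution-only soft matching with corpus-aware pruning (illustrative).

    `substitutes` is a hypothetical map from a query token to acceptable
    alternatives. Candidate patterns grow one position at a time, and any
    prefix that never occurs in the corpus is dropped before it can branch.
    """
    frontier = [[]]
    for q in query:
        options = [q] + substitutes.get(q, [])
        frontier = [p + [t] for p in frontier for t in options
                    if count_matches(tokens, sa, p + [t]) > 0]
    return {tuple(p): count_matches(tokens, sa, p) for p in frontier}
```

Pruning on zero counts keeps the frontier bounded by what the corpus actually contains: in a toy corpus containing "the cat sat" and "the dog sat", the query `["the", "cat", "sat"]` with `{"cat": ["dog", "bird"]}` discards the "the bird" branch after two tokens and returns only the two attested variants.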

large language model, machine learning, pattern recognition

arXiv.org Machine Learning

2602.10908

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Austria > Styria > Graz (0.04)
Europe > Austria > Vienna (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Leisure & Entertainment > Sports > Olympic Games (0.95)
Health & Medicine > Therapeutic Area > Immunology (0.92)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Tactical Optimism and Pessimism for Deep Reinforcement Learning

Neural Information Processing Systems, Feb-9-2026, 05:16:45 GMT

artificial intelligence, learning, reinforcement learning

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

PARD: Permutation-invariant Autoregressive Diffusion for Graph Generation

Neural Information Processing Systems, Feb-7-2026, 20:53:21 GMT

Specifically, we show that, unlike sets, elements in a graph are not entirely unordered: there is a unique partial order for nodes and edges. With this partial order, PARD generates a graph in a block-by-block, autoregressive fashion, where each block's probability is conditionally modeled by a shared diffusion model with an equivariant network.
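To make the block-by-block idea concrete, here is a hypothetical sketch of one way to derive a block order from graph structure alone: repeatedly peel off every node of minimum degree, then reverse the peeling order so generation starts from the last block removed. The peeling rule, the adjacency-dict input format, and the name `peel_blocks` are assumptions chosen for illustration; this is not necessarily the partial order PARD uses, and the diffusion model that generates each block is not shown.

```python
def peel_blocks(adj):
    """Group nodes into blocks by repeated minimum-degree peeling (illustrative).

    `adj` maps each node to the set of its neighbours in an undirected graph.
    Returns blocks in generation order: the last block peeled comes first.
    """
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}  # copy; input is not mutated
    peeled = []
    while remaining:
        min_deg = min(len(nbrs) for nbrs in remaining.values())
        # Every node tied at the minimum degree joins the same block, so the
        # blocks depend only on graph structure, not on how nodes are labelled.
        block = {v for v, nbrs in remaining.items() if len(nbrs) == min_deg}
        peeled.append(block)
        for v in block:
            del remaining[v]
        for nbrs in remaining.values():
            nbrs -= block  # drop edges to the nodes just peeled
    return peeled[::-1]
```

For a five-node path 0-1-2-3-4 this yields the blocks {2}, {1, 3}, {0, 4}, and relabelling the nodes permutes the blocks' contents without changing their order, which is the permutation-invariance property such an ordering needs.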

artificial intelligence, graph, machine learning

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Data-Driven Global Sensitivity Analysis for Engineering Design Based on Individual Conditional Expectations

Palar, Pramudita Satria, Saves, Paul, Regis, Rommel G., Shimoyama, Koji, Obayashi, Shigeru, Verstaevel, Nicolas, Morlier, Joseph

arXiv.org Machine Learning, Dec-16-2025

Explainable machine learning techniques have gained increasing attention in engineering applications, especially in aerospace design and analysis, where understanding how input variables influence data-driven models is essential. Partial Dependence Plots (PDPs) are widely used for interpreting black-box models by showing the average effect of an input variable on the prediction. However, their global sensitivity metric can be misleading when strong interactions are present, as averaging tends to obscure interaction effects. To address this limitation, we propose a global sensitivity metric based on Individual Conditional Expectation (ICE) curves. The method computes the expected feature importance across ICE curves, along with their standard deviation, to more effectively capture the influence of interactions. We provide a mathematical proof demonstrating that the PDP-based sensitivity is a lower bound of the proposed ICE-based metric under truncated orthogonal polynomial expansion. In addition, we introduce an ICE-based correlation value to quantify how interactions modify the relationship between inputs and the output. Comparative evaluations were performed on three cases: a 5-variable analytical function, a 5-variable wind-turbine fatigue problem, and a 9-variable airfoil aerodynamics case, where ICE-based sensitivity was benchmarked against PDP, SHapley Additive exPlanations (SHAP), and Sobol' indices. The results show that ICE-based feature importance provides richer insights than the traditional PDP-based approach, while visual interpretations from PDP, ICE, and SHAP complement one another by offering multiple perspectives.
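A small NumPy sketch can show why averaging hides interactions. It assumes, for illustration only, that the importance of a curve is its standard deviation over the evaluation grid; the paper's exact metric may differ, and `ice_curves`, `curve_importance`, `sensitivities`, and the toy model are hypothetical. With this definition the PDP-based value can never exceed the mean ICE-based value: the standard deviation over the grid is a seminorm, hence convex, so the importance of the averaged curve is at most the average of the per-curve importances.

```python
import numpy as np


def ice_curves(model, X, feature, grid):
    """Return an (n_samples, n_grid) array of ICE curves for one feature."""
    curves = np.empty((X.shape[0], grid.size))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # sweep this feature; keep the others at observed values
        curves[:, k] = model(X_mod)
    return curves


def curve_importance(curves):
    # Assumed importance of a curve: its standard deviation over the grid.
    return np.std(curves, axis=-1)


def sensitivities(model, X, feature, grid):
    curves = ice_curves(model, X, feature, grid)
    pdp = curves.mean(axis=0)             # the PDP is the pointwise mean of the ICE curves
    per_curve = curve_importance(curves)  # one importance value per sample
    return {
        "pdp": float(curve_importance(pdp)),
        "ice_mean": float(per_curve.mean()),
        "ice_std": float(per_curve.std()),
    }


def interaction_model(X):
    # Toy model with a pure interaction between x1 and x2.
    return X[:, 0] * X[:, 1]


if __name__ == "__main__":
    X = np.array([[0.0, -1.0], [0.0, 1.0]])
    grid = np.linspace(-1.0, 1.0, 5)
    print(sensitivities(interaction_model, X, feature=0, grid=grid))
```

For f(x) = x1·x2 with x2 taking the values -1 and 1, the two ICE curves for x1 are lines of slope -1 and +1 that cancel in the average, so the PDP is flat and its importance is zero, while the mean ICE-based importance is about 0.707.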

ice curve, interaction, pdp

arXiv.org Machine Learning

2512.11946

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
North America > United States > Colorado > Jefferson County > Golden (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Energy > Renewable > Wind (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Storage capacity of perceptron with variable selection

Xu, Yingying, Ohzeki, Masayuki, Kabashima, Yoshiyuki

arXiv.org Machine Learning, Dec-2-2025

A central challenge in machine learning is to distinguish genuine structure from chance correlations in high-dimensional data. In this work, we address this issue for the perceptron, a foundational model of neural computation. Specifically, we investigate the relationship between the pattern load $\alpha$ and the variable selection ratio $\rho$ for which a simple perceptron can perfectly classify $P = \alpha N$ random patterns by optimally selecting $M = \rho N$ variables out of $N$ variables. While the Cover–Gardner theory establishes that a random subset of $\rho N$ dimensions can separate $\alpha N$ random patterns if and only if $\alpha < 2\rho$, we demonstrate that optimal variable selection can surpass this bound by developing a method, based on the replica method from statistical mechanics, for enumerating the combinations of variables that enable perfect pattern classification. This not only provides a quantitative criterion for distinguishing true structure in the data from spurious regularities, but also yields the storage capacity of associative memory models with sparse asymmetric couplings.
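The random-subset baseline quoted above follows from Cover's function-counting theorem. The sketch below computes, for P points in general position in M dimensions, the fraction of the 2^P labelings that a hyperplane through the origin can realize; with P = αN and M = ρN this fraction tends to 1 when α < 2ρ and to 0 when α > 2ρ as N grows, and equals exactly 1/2 at α = 2ρ. It covers only this baseline, not the replica-method analysis of optimal variable selection, and the function name `separable_fraction` is hypothetical.

```python
from fractions import Fraction
from math import comb


def separable_fraction(P: int, M: int) -> Fraction:
    """Fraction of the 2**P labelings of P >= 1 points in general position
    in R^M that a hyperplane through the origin can realize (Cover, 1965)."""
    # Cover's count of linearly separable dichotomies:
    # C(P, M) = 2 * sum_{k=0}^{M-1} binom(P - 1, k).
    count = 2 * sum(comb(P - 1, k) for k in range(M))
    return Fraction(count, 2 ** P)


if __name__ == "__main__":
    N, rho = 400, 0.25
    M = round(rho * N)  # size of a random subset of the N variables
    for alpha in (0.4, 0.5, 0.6):
        P = round(alpha * N)  # number of random patterns
        print(f"alpha={alpha}: separable fraction {float(separable_fraction(P, M)):.4f}")
```

Exact rational arithmetic avoids overflow and rounding in the binomial sums; at N = 400 and ρ = 0.25 the fraction is already close to 1 for α = 0.4 and close to 0 for α = 0.6, illustrating the sharp transition at α = 2ρ = 0.5.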

perceptron, storage capacity, variable selection

arXiv.org Machine Learning

2512.01861

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.83)

Portuguese Man O'War species honors 'One-Eyed Dragon' samurai

Popular Science, Nov-3-2025, 17:01:00 GMT

The newly discovered P. mikazuki is a tribute to the famous warrior Date Masamune. A team of university students in Japan identified an entirely new species of the mighty Portuguese Man O'War. Described in a study recently published in the journal, the creature's distinct features and fearsome venom have earned it a name that honors a famous 16th-century samurai warrior. It's easy to mistake the Portuguese Man O'War for a jellyfish.

andrew paul, one-eyed dragon, war species

Popular Science

Country:

Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.06)
South America > Chile (0.05)
Pacific Ocean (0.05)

Genre: Research Report > New Finding (0.37)

Industry: Media > Photography (0.31)

Technology: Information Technology > Artificial Intelligence (0.51)

Yoshihiro Murai clinches sixth term as Miyagi governor

The Japan Times, Oct-27-2025, 02:05:00 GMT

Yoshihiro Murai, 65, celebrates his victory in the Miyagi gubernatorial election on Sunday night. SENDAI - Yoshihiro Murai held off four other candidates to clinch his sixth term as governor of Miyagi Prefecture in Sunday's gubernatorial election. Murai, an independent candidate who had support from prefectural assembly members of the Liberal Democratic Party, Japan Innovation Party and Komeito, highlighted his achievements as the prefecture's governor spanning five terms, or 20 years. The 65-year-old former chief of the National Governors' Association pledged to enhance productivity by promoting digital transformation using generative artificial intelligence, in anticipation of a further population decline. He successfully fended off Masamune Wada, 51, also an independent candidate, who had been closing in.

governor, japantimes, yoshihiro murai clinch

The Japan Times

Country:

North America > United States (1.00)
Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.25)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.06)

Industry:

Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence (0.91)
Information Technology > Communications > Social Media (0.77)

Man held in Japan on suspicion of creating female celeb deepfakes made with AI

The Japan Times, Oct-16-2025, 08:09:00 GMT

Tokyo police have arrested a 31-year-old man for allegedly creating fake sexual images of female celebrities with generative artificial intelligence technology and displaying them online, it was learned Thursday. It is the first time that police in Japan have cracked down on sexual deepfake images of celebrities created with generative AI. The suspect, Hiroya Yokoi of the city of Akita, has admitted he began making deepfakes to earn a small amount of money, which he used to cover living expenses and repay a student loan. Authorities believe Yokoi made a total of about 20,000 sexually explicit images of 262 women, such as actors, television personalities and idols, and amassed sales of ¥1.2 million between October last year and September this year.

deepfake, female celeb deepfake, japan

The Japan Times

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.48)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.06)
North America > United States (0.05)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.99)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.58)

Japan's government asks OpenAI to seek permission amid Sora 2 copyright concerns

The Japan Times, Oct-16-2025, 06:45:00 GMT

government ask openai, japan, openai

The Japan Times

Country: